Abstract

In multiterminal estimation the basic theoretical question is to prove the existence of encoding
and decoding schemes that achieve a given rate of compression while retaining a particular
statistical estimation efficiency. By comparison with multiterminal source coding, this field is
much less studied. Essentially, only two approaches have been reported. Zhang and Berger [3]
established an upper bound on the asymptotic estimation efficiency under certain
rate-compatibility constraints and a test channel constraint referred to as the solvability
condition. Han and Amari [2] tightened the upper bound under weaker constraints on both the rates
and the test channel distributions. However, their bound is in most cases prohibitively complex
to compute. Here we unify the two approaches. We construct an upper bound that is asymptotically
equal to Han and Amari's bound, under the same rate-compatibility conditions. Our bound is valid
under weaker constraints on the test channels than those of Zhang and Berger. Moreover, the bound
is easily computed for most source distributions. We also present a new geometric interpretation
of the upper bound on asymptotic estimation efficiency.


1  Introduction 


Assume that the sources $X^n$ and $Y^n$ are i.i.d. according to $p_\theta(x,y)$, where $\theta$ is
a (possibly vector valued) parameter. We assume the existence of an estimator
$\hat\theta(X^n,Y^n)$ which is asymptotically unbiased. We assume the estimator has an asymptotic
(co)variance index $V(\theta)$, where

$$\lim_{n\to\infty} n V(\hat\theta(X^n,Y^n)) = V(\theta),$$


a quantity that depends on the true value of $\theta$. If we restrict the transmission rate of
source $X$ to $R_1$ bits, and the rate of source $Y$ to $R_2$ bits, how much estimation efficiency
of the parameter $\theta$ can we hope to retain? We encode the data strings with encoding functions
$f$ and $g$ respectively and form an estimator $\hat\theta(f(X^n),g(Y^n))$, where we place the
following rate constraints on the encoding functions:

$$\frac{1}{n}\log\|f\| \le R_1, \qquad \frac{1}{n}\log\|g\| \le R_2,$$

where $\|\cdot\|$ denotes the cardinality. We assume that the estimator $\hat\theta(f(X^n),g(Y^n))$
is asymptotically unbiased and that there exists a (co)variance index

$$\lim_{n\to\infty} n V(\hat\theta(f(X^n),g(Y^n))) = V(\theta\mid R_1,R_2).$$

The compression of the data sources leads to a loss of information about the parameter $\theta$
such that

$$V(\theta\mid R_1,R_2) \ge V(\theta).$$

However, if this loss of estimation efficiency is minor compared to $V(\theta)$ itself, we can
conclude that compression does not seriously affect estimation.

2  The  approaches  of  Zhang  and  Berger  and  Han  and  Amari 

To prove the existence of encoding functions $f$ and $g$ and estimators $\hat\theta$ that achieve
a certain covariance index $V(\theta\mid R_1,R_2)$ for given rate constraints, two approaches
exist, by Zhang and Berger (1988) and Han and Amari (1995, 1998), respectively. They are similar
with respect to the information theoretic coding arguments used, but differ widely in how they
establish the achievable covariance index of a given code. Han and Amari provide an upper bound
on the covariance index, which is tight if the optimal coding functions $f$ and $g$ are given.
The bound is difficult to compute even for quite simple data source distributions. Zhang and
Berger give an upper bound which exceeds or equals the bound of Han and Amari, and they place
stronger constraints on the encoding functions. Even so, their bound is simple to compute and
applies to continuous data sources.

The existence of encoding functions $f$ and $g$ that provide "good" estimates for a parameter
$\theta$ is proven via universal coding arguments. For simplicity we will restrict the discussion
to discrete data sources. Recall that the data type denotes the relative frequencies of each
letter outcome in a data string. Thus if data source $X$ is distributed over the alphabet
$\{1, 2, \ldots, M\}$, the type of the data string $X^n$ is

$$t(X^n) = \frac{1}{n}\left(\sum_{i=1}^{n} 1\{X_i=1\},\ \sum_{i=1}^{n} 1\{X_i=2\},\ \ldots,\ \sum_{i=1}^{n} 1\{X_i=M\}\right).$$
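The type of a data string is straightforward to compute. The following is a minimal sketch; the function name `empirical_type` and the example sequence are illustrative assumptions, not part of the original text.

```python
from collections import Counter

def empirical_type(sequence, alphabet_size):
    """Relative frequency of each letter 1..M in the sequence (its type)."""
    n = len(sequence)
    counts = Counter(sequence)
    # Letters that never occur get frequency 0.
    return [counts.get(letter, 0) / n for letter in range(1, alphabet_size + 1)]

# Hypothetical data string over the alphabet {1, 2, 3}.
example_type = empirical_type([1, 3, 3, 2, 3, 1], alphabet_size=3)
```

The entries of a type always sum to one, and two strings that are permutations of each other share the same type.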

We can introduce joint types for $X^n, Y^n$, and conditional types, in a similar fashion. The
standard approach in information theory to proving the existence of effective codes is to
introduce auxiliary variables, which form a collection of codewords for each data source. These
auxiliary variables or codewords $U$ and $V$ are generated according to the "test channel"
distributions defined by $p_\theta(u\mid x)$ and $p_\theta(v\mid y)$. The auxiliary variables and
the data sources thus form a Markov chain:

$$U \to X \to Y \to V.$$

The test channel distributions depend on $\theta$ only through the marginal distributions
$p_\theta(x)$ and $p_\theta(y)$ respectively. Since the true value of $\theta$ is unknown we
approximate the test channels by $p_{t_X}(u\mid x)$ and $p_{t_Y}(v\mid y)$, where $t_X, t_Y$ are
the marginal types of the data sequences $X^n$ and $Y^n$. We construct a large set of such
codewords for each data type and use a random mapping assignment to members of the codebooks.
This is the first step of encoding. It can be shown that the rate constraints map into
restrictions on the test channel distributions. Zhang and Berger (1988) proved the existence of
codes $f$, $g$, with exponentially decaying encoding error probability, under the rate
constraints imposed by the random codebook mapping:

$$R_1 \ge I(U;X), \qquad R_2 \ge I(V;Y).$$

Han and Amari (1995, 1998) showed that these rate-compatibility constraints can be weakened by
adding a second step of encoding. The regular encoding argument ($X \to U$, $Y \to V$) is followed
by the binning of codewords $U$ and $V$, and minimum-entropy decoding. The resulting constraints
on the test channels can be expressed through the following inequalities:

$$R_1 \ge I(U;X\mid V), \qquad R_2 \ge I(V;Y\mid U), \qquad R_1 + R_2 \ge I(U,V;X,Y).$$


[Figure 1: Rate compatibility conditions for the methods of Zhang and Berger (ZB) and Han and
Amari (HA). Horizontal axis: $R_1$, marked at $I(U;X\mid V)$ and $I(U;X)$. Vertical axis: $R_2$,
marked at $I(V;Y\mid U)$ and $I(V;Y)$.]
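For discrete sources the two sets of rate conditions can be checked numerically. The sketch below assumes finite alphabets and test channels given as row-stochastic matrices; the function names and the binary example are illustrative assumptions, and the joint distribution is built from the Markov chain $U \to X \to Y \to V$.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def rate_compatibility(p_xy, p_u_given_x, p_v_given_y, r1, r2):
    """Return (information quantities, ZB conditions hold, HA conditions hold).

    p_xy[x, y] is the joint source pmf; p_u_given_x[x, u] and
    p_v_given_y[y, v] are row-stochastic test channel matrices.
    """
    # Joint pmf over (U, X, Y, V) implied by the Markov chain U - X - Y - V.
    joint = np.einsum("xy,xu,yv->uxyv", p_xy, p_u_given_x, p_v_given_y)
    U, X, Y, V = 0, 1, 2, 3

    def h(*keep):
        # Entropy of the marginal over the kept axes.
        drop = tuple(axis for axis in range(4) if axis not in keep)
        return entropy_bits(joint.sum(axis=drop) if drop else joint)

    info = {
        "I(U;X)": h(U) + h(X) - h(U, X),
        "I(V;Y)": h(V) + h(Y) - h(V, Y),
        "I(U;X|V)": h(U, V) + h(X, V) - h(U, X, V) - h(V),
        "I(V;Y|U)": h(V, U) + h(Y, U) - h(V, Y, U) - h(U),
        "I(U,V;X,Y)": h(U, V) + h(X, Y) - h(U, X, Y, V),
    }
    zb = r1 >= info["I(U;X)"] and r2 >= info["I(V;Y)"]
    ha = (r1 >= info["I(U;X|V)"] and r2 >= info["I(V;Y|U)"]
          and r1 + r2 >= info["I(U,V;X,Y)"])
    return info, bool(zb), bool(ha)

# Hypothetical binary example: correlated source, symmetric test channels.
p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
info, _, _ = rate_compatibility(p_xy, bsc, bsc, 1.0, 1.0)
r1 = 0.5 * (info["I(U;X)"] + info["I(U;X|V)"])  # strictly between the two marks
r2 = info["I(V;Y)"] + 1e-9
_, zb_ok, ha_ok = rate_compatibility(p_xy, bsc, bsc, r1, r2)
```

In this example the rate pair lies between the two marks on the $R_1$ axis, so it satisfies the Han and Amari conditions but not those of Zhang and Berger.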

Given encoding functions $f$ and $g$ we can construct an estimator $\hat\theta(f(X^n),g(Y^n))$.
Here the encoding functions are either the result of the first encoding step (Zhang and Berger),
or of the first and second steps together with minimum-entropy decoding (Han and Amari). We want
to determine the covariance index of the estimator $\hat\theta(f,g)$. All the information about
the parameter $\theta$ can be deduced from the observed (and decoded) data type $t_{UXYV}$. Han
and Amari form the maximum likelihood estimate based on the distribution $p_\theta(t_{UXYV})$. By
construction, the test channels place constraints on the data type $t_{UXYV}$. Han and Amari
formulate these constraints through a projection operator $H$ on the space of data types
(marginals and joint). This generalized MLE is elegant and provides an upper bound on the
estimation efficiency of estimators $\hat\theta(f(X^n),g(Y^n))$ given a rate constraint
$(R_1,R_2)$, i.e.,

$$V(\theta\mid R_1,R_2) \le V_{HA}(\theta\mid R_1,R_2) = (F_\theta(H))^{-1} + O(n^{-1/2}),$$

where $F_\theta$ is the Fisher information with respect to $p_\theta(t_{UXYV})$, a function of the
projection matrix $H$. However, forming the projection operator $H$ is no small feat, even for
such limited and simple cases as binary data sources. Computing the bound on estimation
efficiency using the techniques of Han and Amari, $V_{HA}(\theta\mid R_1,R_2)$, is therefore
prohibitively complex for larger source alphabets. Moreover, it is unclear whether this approach
can be extended to continuous data sources, even in an abstract form.

Zhang and Berger (1988) construct their bounds on estimation efficiency by computing the
ensemble mean and variance over the randomly generated codebooks (first step of encoding). Their
argument is only valid for additive estimators, i.e. estimators such that

$$\hat\theta(X^n,Y^n) = \frac{1}{n}\sum_{i=1}^{n} \hat\theta(X_i,Y_i)$$

holds. Moreover, they assume that an additive estimator based on the encoded data can be obtained
as the solution to the linear equation system

$$\sum_{u,v} p_\theta(u\mid x)\, p_\theta(v\mid y)\, \hat\theta(u,v) = \hat\theta(x,y), \qquad \forall x,y.$$

This puts a very limiting constraint on the test channel distributions, but the additivity of the
estimator $\hat\theta(f(X^n),g(Y^n))$ follows. For such additive estimators, a repeated random
coding argument ensures the existence of encoder functions and estimators whose means and
variances come arbitrarily close to the ensemble quantities. Zhang and Berger thus avoid the
construction of the data type distribution. In fact, computation of the efficiency bounds only
requires moments under the distribution $p_\theta(U,X,Y,V)$ (in contrast to $p_\theta(t_{UXYV})$
for $V_{HA}$). Zhang and Berger's upper bound on the asymptotic efficiency equals

$$V(\theta\mid R_1,R_2) \le V_{ZB}(\theta\mid R_1,R_2) = V(\hat\theta(U,V)) + E[E(\hat\theta\mid X)]^2 + E[E(\hat\theta\mid Y)]^2 - E[E(\hat\theta\mid U,X)]^2 - E[E(\hat\theta\mid V,Y)]^2 + O(n^{-1/2}).$$
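For finite alphabets and a fixed value of $\theta$, the solvability condition of Zhang and Berger is a linear system in the unknowns $\hat\theta(u,v)$, so it can be tested numerically. The following sketch is only an illustration under those assumptions: the function name, the matrix layout and the example channels are hypothetical, and the pseudoinverse is used here as a convenient way to obtain a least-squares solution.

```python
import numpy as np

def solve_solvability(p_u_given_x, p_v_given_y, theta_xy):
    """Least-squares solution S[u, v] of  A @ S @ B.T = T  and its max residual.

    A[x, u] = p(u|x), B[y, v] = p(v|y), T[x, y] = full-data estimator values.
    A residual of (numerically) zero means the condition can be met.
    """
    a = np.asarray(p_u_given_x, dtype=float)
    b = np.asarray(p_v_given_y, dtype=float)
    t = np.asarray(theta_xy, dtype=float)
    # sum_{u,v} a[x,u] b[y,v] s[u,v] = t[x,y]  is the matrix equation a s b^T = t.
    s = np.linalg.pinv(a) @ t @ np.linalg.pinv(b.T)
    residual = float(np.abs(a @ s @ b.T - t).max())
    return s, residual

# Hypothetical binary example with per-sample estimator 1{x = y}.
target = np.eye(2)
noisy_x = np.array([[0.9, 0.1], [0.1, 0.9]])
noisy_y = np.array([[0.8, 0.2], [0.2, 0.8]])
_, residual_ok = solve_solvability(noisy_x, noisy_y, target)

# A test channel that ignores its input cannot satisfy the condition.
useless = np.array([[0.5, 0.5], [0.5, 0.5]])
_, residual_bad = solve_solvability(useless, noisy_y, target)
```

Invertible channel matrices always admit a solution, while degenerate channels generally do not, which illustrates how restrictive the condition can be.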

3  Extending  Zhang  and  Berger’s  approach 

The limitation of Han and Amari's approach lies in the complexity of the distribution
$p_\theta(t_{UXYV})$. The solvability condition and the one-step encoding limit the approach of
Zhang and Berger. In order to construct computable efficiency bounds we choose to extend the
approach of Zhang and Berger. We place the following constraints on the test channel
distributions. Assume an exponential family source distribution $p_\theta(x,y)$. Restrict the
test channels $p_\theta(u\mid x)$ and $p_\theta(v\mid y)$ to map to another exponential family
distribution $p_\theta(u,v)$ such that $I_\theta(U;X) > 0$ and $I_\theta(V;Y) > 0$ hold. Refer to
the canonical parameters of $p_\theta(u,v)$ as $\eta$. Note that $\eta$ is not equal to $\eta'$,
the canonical parameters of $p_\theta(x,y)$. However, a reparameterization of $p_\theta(x,y)$ with
parameters $\eta, \delta$ is in general possible. In the multinomial case, for example, $\eta$ may
correspond to linear combinations of outcome probabilities. Assume the existence of functionals
$h: \eta \to \theta$ (where both $\eta$ and $\theta$ may be vector valued) such that

$$\left|\partial^{q}_{i,j,\ldots} h(\eta)\right| < \infty, \qquad \forall\, i, j, \quad q = 0, \ldots, 4.$$

This restriction on the test channel distributions is stronger than that of Han and Amari, who
need only assume the existence of bounded and continuous first order derivatives of
$p_\theta(u\mid x)$ with respect to $p_\theta(x)$ (and similarly for $y$). However, it is weaker
than the solvability condition, and can easily be verified in practice.

We compute the decoder ensemble moments of estimators $\hat\eta$, which are additive and
asymptotically efficient. Using the decoder ensemble mean allows us to extend Zhang and Berger's
rate compatibility region (Fig. 1) to that of Han and Amari. The binning of codewords and
minimum-entropy decoding is an additional random step over which to average. Under the rate
compatibility conditions

$$R_1 \ge I(U;X\mid V), \qquad R_2 \ge I(V;Y\mid U), \qquad R_1 + R_2 \ge I(U,V;X,Y)$$

the encoder and decoder ensemble moments are asymptotically equal. For estimates of the canonical
parameters we can thus establish a bound which asymptotically coincides with the bound of Han
and Amari, since the ML estimates are indeed additive for $\eta$. For now, let $\eta$ be the
parameter of interest. The asymptotic efficiency bound equals

$$V(\eta\mid R_1,R_2) \le V_{eZB}(\eta\mid R_1,R_2) = V(\hat\eta(U,V)) - E[E(\hat\eta(U,V)\mid X) - E(\hat\eta(U,V)\mid U,X)]^2 - E[E(\hat\eta(U,V)\mid Y) - E(\hat\eta(U,V)\mid V,Y)]^2. \qquad (1)$$

The test channel constraints on the data type distribution, which in Han and Amari's work entered
through the projection operator $H$, now appear in the second and third terms of $V_{eZB}$. The
first term is the variance of the estimator based on encoded information when these constraints
are ignored, i.e. under the marginal distribution $p_\eta(U,V)$. The second and third terms reduce
this quantity by, as we see below, the variances over the constructed test channels.
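The correction terms in equation (1) have the same structure as the conditional-expectation terms in Zhang and Berger's bound. As a short worked step, reading $E[\cdot]^2$ as the expectation of the square and using only the tower property $E[E(\hat\eta\mid U,X)\mid X] = E(\hat\eta\mid X)$, the second term expands as

$$E\big[(E(\hat\eta\mid X) - E(\hat\eta\mid U,X))^2\big] = E\big[E(\hat\eta\mid U,X)^2\big] - E\big[E(\hat\eta\mid X)^2\big],$$

because the cross term satisfies $E[E(\hat\eta\mid X)\,E(\hat\eta\mid U,X)] = E[E(\hat\eta\mid X)^2]$. The same expansion holds for the third term with $(V,Y)$ in place of $(U,X)$.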


Figure  2:  Geometric  interpretation  of  the  asymptotic  estimation  efficiency  bound. 


We can rewrite $V_{eZB}$ as follows:

$$V_{eZB} = E_{f,g}[V(\hat\eta(f,g)\mid f,g)].$$

$E_{f,g}$ is an operator that corresponds to the ensemble mean over the decoded codebooks, $f$ and
$g$. We can identify the terms in equation (1) with those of a variance decomposition,

$$V_{eZB} = E_{f,g}[V(\hat\eta(f,g)\mid f,g)] = V(\hat\eta(f,g)) - V_{f,g}[E(\hat\eta(f,g)\mid f,g)]. \qquad (2)$$

The conditional expectation $E(\hat\eta(f,g)\mid f,g)$ is a projection operator onto the
orthogonal codebook components, since in the multiterminal setup the codebooks $f$ and $g$ are
generated independently. Thus, the second and third terms of equation (1) equal

$$V_{f,g}[E(\hat\eta(f,g)\mid f,g)] = V_f[E(\hat\eta(f,g)\mid f,g)_f] + V_g[E(\hat\eta(f,g)\mid f,g)_g],$$

where the subscripts $f$ and $g$ denote the orthogonal components. A geometric illustration is
shown in Figure 2. The quantity of interest is the ensemble mean of the variance of the
estimator, $E_{f,g}[V(\hat\eta(f,g)\mid f,g)]$. The marginal variance $V(\hat\eta(f,g))$ is an
overestimate of this quantity, since the codebook construction is known (i.e. determined by
zero-rate quantities). This overestimate is corrected by removing the portion of the variance of
$\hat\eta$ that we control, i.e. the variance over the constructed codebooks. With independent
codebook components we get the expression in equation (2).

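With the restriction placed on the test channels through the function $h$ we can now construct
a bound for an estimator $\hat\theta$ using the bound for $\hat\eta$. By a delta-method argument
we can show that

$$V(\theta\mid R_1,R_2) \le V_{eZB}(\theta\mid R_1,R_2) = \sum_{i:\,\partial_i h(\eta)\neq 0} |\partial_i h(\eta)|^2 \Big[V(\hat\eta_i(f,g)) - V_{f,g}[E(\hat\eta_i(f,g)\mid f,g)]\Big]$$

$$\qquad + \sum_{i,j:\,\partial_{i,j} h(\eta)\neq 0} (\partial_{i,j} h(\eta)) \Big[\mathrm{Cov}(\hat\eta_i(f,g),\hat\eta_j(f,g)) - \mathrm{Cov}_{f,g}[E(\hat\eta_i(f,g)\mid f,g),\, E(\hat\eta_j(f,g)\mid f,g)]\Big] + O(n^{-1/2}).$$

The construction of the covariance bounds proceeds in a similar fashion to the above and is left
out to conserve space.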
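As a sanity check on the delta-method step, the uncompressed bivariate Gaussian case can be worked out numerically. The sketch below assumes zero-mean Gaussian data, takes the second moments $(S_{xx}, S_{yy}, S_{xy})$ as the additive statistics, and uses $h = S_{xy}/\sqrt{S_{xx}S_{yy}}$ for the correlation; all names are illustrative, and the codebook correction terms of the bound are deliberately left out.

```python
import numpy as np

def second_moment_covariance(var_x, var_y, rho):
    """Per-sample covariance matrix of (X^2, Y^2, X*Y) for zero-mean Gaussian (X, Y)."""
    c = rho * np.sqrt(var_x * var_y)  # Cov(X, Y)
    # Gaussian fourth moments: Cov(AB, CD) = Cov(A,C)Cov(B,D) + Cov(A,D)Cov(B,C).
    return np.array([
        [2 * var_x**2,  2 * c**2,      2 * var_x * c],
        [2 * c**2,      2 * var_y**2,  2 * var_y * c],
        [2 * var_x * c, 2 * var_y * c, var_x * var_y + c**2],
    ])

def delta_method_variance_of_rho(var_x, var_y, rho):
    """Asymptotic variance index of the correlation estimate via the delta method."""
    # Gradient of h(sxx, syy, sxy) = sxy / sqrt(sxx * syy) at the true moments.
    grad = np.array([
        -rho / (2 * var_x),
        -rho / (2 * var_y),
        1.0 / np.sqrt(var_x * var_y),
    ])
    return float(grad @ second_moment_covariance(var_x, var_y, rho) @ grad)

index_full_data = delta_method_variance_of_rho(1.0, 1.0, 0.5)
```

Under these assumptions the result equals $(1-\rho^2)^2$ regardless of the marginal variances, which is the familiar efficiency of the sample correlation coefficient when no compression is applied.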

We conclude with a simple example. Assume a bivariate Gaussian model $p_\theta(x^n, y^n)$ with
$\theta = (\sigma_x^2, \sigma_y^2, \rho)$. We use the test channels
$p(u\mid x) = N(x, \sigma_1^2)$ and $p(v\mid y) = N(y, \sigma_2^2)$, which map between exponential
families. The $V_{eZB}$ bound still applies if we discretize $(U, X, Y, V)$ to
$(\tilde U, \tilde X, \tilde Y, \tilde V)$, with the number of discretization levels growing with
the sample size $n$ at rate $O(n^\alpha)$, where $\alpha \in (\tfrac{1}{2}, 1)$. With $\alpha$ in
this range the marginal types can still be transmitted at zero rate, and the resulting summary
statistics differ from those of the continuous distributions by no more than $O(n^{-1})$. Let
$\rho$ be the parameter of interest. It is easy to verify that the condition on $h$ applies for
this parameter and the given test channels. We compute the $V_{eZB}$ bound as

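$$nV(\hat\rho(f(X^n),g(Y^n))) \le (1-\rho^2)^2 + \frac{1}{(2^{2R_1}-1)(2^{2R_2}-1)} + \rho^2\left(\frac{1}{2^{2R_1}-1} + \frac{1}{2^{2R_2}-1} + \frac{1}{2^{2R_2}(2^{2R_1}-1)} + \frac{1}{2^{2R_1}(2^{2R_2}-1)}\right).$$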

Zhang  and  Berger’s  bound  is  given  by 

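$$nV(\hat\rho(f(X^n),g(Y^n))) \le (1+\rho^2) + \left(\frac{1}{2^{2R_1}-1} + \frac{1}{2^{2R_2}-1}\right) + \frac{1}{(2^{2R_1}-1)(2^{2R_2}-1)} + \rho^2\left(\frac{1}{2^{2R_1}-1} + \frac{1}{2^{2R_2}-1}\right),$$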

which is obviously larger than $V_{eZB}$. Their bound also uses a suboptimal full-data estimator
as the baseline for comparison, which is reflected in the first term $(1 + \rho^2)$. The
interpretation of the bound is particularly simple in this example. It corresponds to estimating
the parameter $\rho$ from noisy Gaussian data when the signal-to-noise ratios, or equivalently the
noise variances, are known.
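The recurring factor $1/(2^{2R}-1)$ has a direct reading as a noise variance. The sketch below assumes an additive Gaussian test channel $U = X + N$ whose mutual information $I(U;X)$ is set equal to the rate $R$ in bits, which is one way (not necessarily the authors') to relate a rate to a signal-to-noise ratio; the function names are illustrative.

```python
import math

def gaussian_mutual_information_bits(signal_var, noise_var):
    """I(U;X) in bits for U = X + N with independent Gaussian X and N."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)

def noise_variance_for_rate(signal_var, rate_bits):
    """Noise variance at which the Gaussian test channel has I(U;X) = rate_bits."""
    return signal_var / (2.0 ** (2.0 * rate_bits) - 1.0)

# With unit signal variance, one bit per sample corresponds to noise variance 1/3.
noise_var_one_bit = noise_variance_for_rate(1.0, 1.0)
recovered_rate = gaussian_mutual_information_bits(1.0, noise_var_one_bit)
```

Higher rates shrink the equivalent noise variance geometrically, so the rate-dependent terms of both bounds vanish as $R_1, R_2 \to \infty$.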